Quentin Lhoest PRO

lhoestq

AI & ML interests

Maintainer of 🤗Datasets: NLP, Multimodal data processing and sharing

Recent Activity

updated a dataset about 5 hours ago

infinite-dataset-hub/LegalCasePrecedent

published a dataset about 5 hours ago

infinite-dataset-hub/LegalCasePrecedent

updated a dataset about 13 hours ago

infinite-dataset-hub/OnlinePaymentFraud

lhoestq's activity

upvoted a collection about 16 hours ago

BioReason

BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model • 3 items • Updated 1 day ago • 5

upvoted 2 collections about 23 hours ago

ConTEB training datasets

Training data for the InSeNT method. • 3 items • Updated 1 day ago • 1

ConTEB evaluation datasets

Evaluation datasets of the ConTEB benchmark. Use "test" split where available, otherwise "validation", otherwise "train". • 8 items • Updated 1 day ago • 1

upvoted an article about 23 hours ago

Article

Context Is Gold to Find the Gold Passage: Evaluating and Training Contextual Document Embeddings

By … and 1 other • 1 day ago • 13

upvoted a collection 1 day ago

Comma v0.1 Artifacts

A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 2 items • Updated 2 days ago • 2

upvoted an article 8 days ago

Article

Interactive Tools for machine learning, deep learning, and math

By … • 8 days ago • 40

upvoted 2 articles 10 days ago

Article

Introducing Gradio's new Dataframe!

By … and 1 other • Mar 24 • 27

Article

Tiny Agents in Python: a MCP-powered agent in ~70 lines of code

By … and 3 others • 12 days ago • 117

upvoted 2 changelogs 11 days ago

Changelog

Static Spaces can now have a build step

11 days ago • 80

Changelog

Xet is now the default storage option for new users and organizations

11 days ago • 52

upvoted an article 14 days ago

Article

NVIDIA Cosmos Now Available On Hugging Face For Physical AI Reasoning

By … and 1 other • 15 days ago • 24

upvoted a paper 17 days ago

LightLab: Controlling Light Sources in Images with Diffusion Models

Paper • 2505.09608 • Published 20 days ago • 31

upvoted a paper 18 days ago

SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Paper • 2505.02370 • Published 29 days ago • 14

upvoted an article 18 days ago

Article

The Transformers Library: standardizing model definitions

By … and 3 others • 20 days ago • 109

upvoted an article 20 days ago

Article

Highlights from the First ICLR 2025 Watermarking Workshop

By … and 4 others • 20 days ago • 10

upvoted an article 23 days ago

Article

LeRobot Community Datasets: The “ImageNet” of Robotics — When and How?

By … and 6 others • 24 days ago • 52

upvoted an article 27 days ago

Article

AI Personas: The Impact of Design Choices

By … and 1 other • 27 days ago • 14

upvoted 3 collections 27 days ago

Hugging Face community’s Wikimedia datasets

Wikimedia datasets created by the Hugging Face community, not Wikimedia. Sorted by Wikimedia project. • 17 items • Updated Jun 7, 2024 • 11

SwallowMath

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 11 items • Updated 27 days ago • 3

SwallowCode

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code • 66 items • Updated 27 days ago • 3